Automatic Recognition of Composite Verb Forms in Serbian

Author

  • Bojana Dordevic
Abstract

This paper presents work on building a shallow parser that recognizes composite verb forms in Serbian, that is, forms consisting of an auxiliary verb and a main verb. The parser is implemented in Unitex, a corpus-processing system, as a set of local grammars that rely on morphological dictionaries of Serbian. The model was tested in several phases on a small corpus (171 kw in total) of texts both originally written in Serbian and translated into Serbian. In the current phase, an average of 95.8% of units are correctly recognized; the translation of Jules Verne's Around the World in 80 Days gives the best result (98.8%), and a short story by Ivo Andrić, A Vacation in the South, gives the worst (91.7%).
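
The idea of pairing an auxiliary with a main verb through dictionary lookup can be illustrated with a small Python sketch. This is not the paper's Unitex grammar: the toy dictionary, the tag names (`AUX_BITI`, `AUX_HTETI`, `PART`, `INF`), the function `find_composites`, and the `max_gap` window are all assumptions made for illustration. It covers only two patterns, the perfect (present of biti plus l-participle) and future I (clitic of hteti plus infinitive), and ignores ambiguity that a real grammar would have to resolve.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    lemma: str
    tag: str  # hypothetical tags: AUX_BITI, AUX_HTETI, PART, INF


# Toy stand-in for a morphological dictionary (illustrative, far from complete).
TOY_DICT = {
    "sam": Entry("biti", "AUX_BITI"),
    "je": Entry("biti", "AUX_BITI"),
    "su": Entry("biti", "AUX_BITI"),
    "ću": Entry("hteti", "AUX_HTETI"),
    "će": Entry("hteti", "AUX_HTETI"),
    "čitao": Entry("čitati", "PART"),
    "pisala": Entry("pisati", "PART"),
    "čitati": Entry("čitati", "INF"),
    "pisati": Entry("pisati", "INF"),
}

# For each auxiliary tag: the main-verb tag it combines with, and a label.
PATTERNS = {
    "AUX_BITI": ("PART", "perfect"),
    "AUX_HTETI": ("INF", "future I"),
}


def find_composites(tokens: list[str], max_gap: int = 2) -> list[tuple[str, str, str]]:
    """Return (auxiliary, main verb, label) triples found in the token list.

    For each auxiliary, search to the right and then to the left, allowing
    at most `max_gap` intervening tokens, and take the first match.
    """
    tagged: list[Optional[Entry]] = [TOY_DICT.get(t.lower()) for t in tokens]
    results = []
    for i, entry in enumerate(tagged):
        if entry is None or entry.tag not in PATTERNS:
            continue
        wanted, label = PATTERNS[entry.tag]
        right = range(i + 1, min(len(tokens), i + 2 + max_gap))
        left = range(i - 1, max(-1, i - 2 - max_gap), -1)
        for j in list(right) + list(left):
            other = tagged[j]
            if other is not None and other.tag == wanted:
                results.append((tokens[i], tokens[j], label))
                break
    return results


if __name__ == "__main__":
    print(find_composites("Ja sam juče čitao knjigu".split()))
    print(find_composites("Ona će pisati pismo".split()))
```

The search window reflects the fact that the auxiliary and the main verb need not be adjacent and may appear in either order (for example "sam juče čitao" and "Čitao sam"); a fixed window is a simplification chosen for this sketch.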

Similar resources

Composite Tense Recognition and Tagging in Serbian

The technology of finite-state transducers is implemented to recognize, lemmatize and tag composite tenses in Serbian in a way that connects the auxiliary and main verb. The suggested approach uses a morphological electronic dictionary of simple words and appropriate local grammars.

A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora

Verb aspect is a grammatical and lexical category that encodes temporal unfolding and duration of events described by verbs. It is a potentially interesting source of information for various computational tasks, but has so far not been studied in much depth from the perspective of automatic processing. Slavic languages are particularly interesting in this respect, as they encode aspect through ...

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers, and the selection of suitable features may significantly affect its performance. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...

Local Grammars and Compound Verb Lemmatization in Serbo-Croatian

The increasing production of electronic (digital) texts (either on the Web or in other electronically available forms, such as digital libraries or archives) demands appropriate computer tools that can help human users in text manipulation and, additionally, in performing automatic processing of language resources. In the first place, a natural language processing (NLP) system needs to implemen...

Speech Technologies for Serbian and Kindred South Slavic Languages

This chapter will present the results of the research and development of speech technologies for Serbian and other kindred South Slavic languages used in five countries of the Western Balkans, carried out by the University of Novi Sad, Serbia in cooperation with the company AlfaNum. The first section will describe particularities of highly inflected languages (such as Serbian and other language...

Publication date: 2012